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A method for estimating return values from ensembles of forecasts at advanced lead 

times is presented. Return values of significant wave height in the North-East Atlantic, 

' ^ ' ■ the Norwegian Sea and the North Sea are computed from archived +240-h forecasts of 

Dh' the ECMWF ensemble prediction system (EPS) from 1999 to 2009. 

O ■ We make three assumptions: First, each forecast is representative of a six-hour inter- 

^ [ val and collectively the data set is then comparable to a time period of 226 years. Second, 

. y I the model climate matches the observed distribution, which we confirm by comparing 

^, w^ith buoy data. Third, the ensemble members are sufficiently uncorrelated to be con- 

rl , 
'q^, sidered independent realizations of the model climate. We find anomaly correlations of 

0.20, but peak events (> P97) are entirely uncorrelated. By comparing return values from 

individual members with return values of subsamples of the data set w^e also find that 
■^ ■ the estimates follow the same distribution and appear unaffected by correlations in the 

-y-K ■ ensemble. The annual mean and variance over the 11-year archived period exhibit no 

significant departures from stationarity compared with a recent reforecast, i.e., there is 

no spurious trend due to model upgrades, 
cn I EPS yields significantly higher return values than ERA-40 and ERA-Interim and is 

in good agreement with the high-resolution hindcast NORAIO, except in the lee of un- 

i^ , resolved islands where EPS overestimates and in enclosed seas where it is biased low. 

;_( , Confidence intervals are half the width of those found for ERA-Interim due to the mag- 



o 



c^ 



nitude of the data set. 
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1 Introduction 



Extreme value estimates of atmospheric and oceanographic variables are usually derived 
from observational records or from model reconstructions of the past (reanalyses and 
hindcasts). 

A given probability of exceedance or equivalently return period corresponds to a re- 
turn value of the geophysical variable in question. This return value is normally approx- 
imated by fitting the Generalized Extreme Value (GEV) distribution to blocked maxima 
(such as annual maxima) or by fitting the G ener alized Pareto (GP) distribution to ex- 
ceedances above a threshold, see lSmithI (|l990l ) and lColesI (|200l[ ). In t he atmospheric and 
ocean ographic sciences 100-year return values are usually sought ( Lopatoukhin et alj. 
20001 ). which means that observational or modeled time series are rarely lo ng enough to 



cove r the return period, even for the longe st reanalyses and hindcasts (see lUppala et al. 



2005; 



Kalnay et al. 



1996; 



Wang et al. 



12012 



for descriptions of some recent reanalyses). 
Extrapolation of the parametric fit to lower probabilities of exceedance (return periods 
longer than the observational record or modeled time series) is then required. This af- 
fects the confidence intervals of the return v alue estimates and is a concern when using 



shorter records like altimeter measurements ( Alves and Youn; 



2003; 



Young et al.l.l201lh 



Model or observational bias will further increase the confidence intervals, but is much 
harder to identify than the unsystematic error stemming from insufficient length of the 
time series. 

Trends and low-frequency oscillations can seriously influence return val ue estimates 
from time series. This can be handled using non-stationary techn iques ( see 



Chapter 6 for an introduction). Due to imminent climate change (|IPCC , 



2001 



200: 



Coles 

Tj), estimat 



ing return valu es from time series with tr ends has recently received soin e attention in the 



earth sciences. 



Kharin and Zw^iersl (|2000l ) and iKharin and ZwiersI pOOSh inve stigated the 



impac t of a linear trend on the GEV distribution of the annual extremes w^hile lFarey et al 



POOTI ) looked at extending the extreme value theory to assess the return values of tem- 
perature extremes i n the presence of a l inear trend over a 54-year period for French 
observing stations. 



de Winter et al 



l3 



( 20121) investigated the changing wave extremes in 



a regional c 



imate pr ojection of the North S ea for the time-slice 2071-2100. Similarly, 



Wang et al.l (|200J) and I Wang and Swaill (|2006h investigated the impact of changing wave 



climate on wave extremes in the span of the 21st century using statistical projections and 
coupled climate models. 

Even if non-stationarity can be handled it raises the question of what exactly the re- 
turn value estimates are to be used for. If a probability of exceedan ce valid for a certain 
time period is required, similar to what iKharin and Zw^iersl l2000l did for 21-year time 



slices from climate projections considered sufficiently stationary, then a long time series 
is not necessarily of much interest. What is then needed is an estimate of exceedance 
levels for that given time slice. Such a repository of possible w^eather realizations does 
in fact exist. The ensemble prediction system (EPS) operated by the European Centre 
for Medium-Range Weather Forecasts (ECMWF) has now been in operation for 20 years 



Molteni et al. 



1996; 



Buizza et al. , 



2007; 



Hagedorn et al.l. 120081 ). The individual ensem- 



ble members start from almost ide ntical initial conditions with only small perturbati ons 



added to the best guess analysis (|Buizza et al. 



1999; 



Leutbecher and Palmer!. 12008 ) to 



spread the ensemble in a way representative of the uncertainty of the forecast system. 



Altho ugh there is considerable forecast skill after a lead time of five days ([Richardson, 
20101 ). the skill drops rapidly after day six, and on day 10 the individual members are 
only weakly correlated with each other and w^ith observations, as we will show in Sec|2l 
If the quantiles of the entire cumulative distribution of the ensemble compare well with 
observations then the forecasts can be considered random realizations of a realistic model 
climate. 

Return values for significant wave height have been estimated from a wide varietv 
of da ta s ources in the past, ranging fr om relatively short o bservational records dBatties 



1972 and 



Muir and El-Shaaraw^i 



2005a: 



Sterl and Caires 



Weisse and Giinthei , 



1986), satellite altimeters 



jCooper 



and Forristal] , 



199 



I 



Alves and Yound,l2003l:lyinoth and Youngl,l201ll '), long-teri n global reanalyses dCaires and Sterl , 



20071: 



S 



20051). regiona 



Aarnes et al. 



model hindcasts jWang and Swai] 



2OI2I ), and statistical downscaling JBreivik et al. 



2001 



2002 



2OO9I ). Here we explore a new approach to estimating return values of significant wave 
height using ensemble forecasts at advanced lead times inste ad of a time series. A simi- 
lar approach has been explored by lVan den Brink et al.l (|2005l ) for the special case of river 



flooding protection using seasonal forecas t ensembles from ECMWF's earlie r System 2 
Seasonal Forecast System (jAnderson et al.l. l2003l ). IVan den Brink et al.l (|2005l ) employed 
the entire seasonal forecasts from a lead time of one month up to six months, arguing 
that the modelled North Atlantic Oscillation (NAO) was only weakly correlated with 
observed NAO after one month, dropping further to essentially zero for the subsequent 
months. We employ a different approach where we instead extract the significant w^ave 
height for a fixed forecast time (+240 h) from the EPS version of the Integrated Forecast 
System (IFS) of ECMWF. We have gathered all forecasts at +240 h generated during the 
period 1999-2009, equivalent (as will be explained in Sec 121) to ~226 years if the data had 
formed a continuous time series. As will be explained in Sec|3]we assume that each fore- 
cast represents a six-hour interval, which is a reasonable assumption for a coarse model 
and analogous to the temporal resolution of traditional reanalyses. However, this also 
means that we estimate return values of the six-hourly average sea state. We address this 
in Sec |3] and discuss the implications further in Sec|5l 

The method to be explored allows us to utilize a vast unused resource of climate 
realizations and their lack of skill is actually a prerequisite since extreme value theory 
demands that events be uncorrelated. However, there are important caveats to the inter- 
pretation and use of the method. First, climate trends are by construct not captured by 
the method since we base our estimates on a time-slice of ~10 years. Likewise, quasi- 
cyclical phenomena like El Nino with a period longer than what is covered by the archive 
may influence the results. This suggests the following use and interpretation of EPS re- 
turn values: If probabilities of exceedance for the present time period are sought, then 
the ensemble data set is superior since it is not affected by long-term trends and low- 
frequency cycles. If on the other hand long-term return values are required then tech- 
niques for estimating extremes from time seri es with trends must be considered (see 



Kharin and Zwiers 



2000 . 



20051: 



Parey et al.ll2007n . or at least comparison with traditional 



time series covering a sufficiently long period. 

The paper is organized as follows. Sec |2] presents the observational records and the 
reanalysis and htndcast data sets used to test the method. Sec |3] presents the method 
used to compute return values from forecast ensembles at long lead times and how it 
differs from traditional return value estimates from observational records and modeled 



time series. We then investigate the independence of ensemble members and the clima- 
tology of the a rchived forecasts by c omparing against a model climatology from a recent 
reforecast (see lHagedorn et alJl2012r) and observations. Sec S] compares the return values 
found from the EPS with three reference model data sets, namely the reanalyses ERA-40 
and ERA-Interim (ERA-I hereafter) and a hi gh-resolutiori regional hindcast for th e Nor- 
wegian Sea and adjacent seas, NORAIO (see 



Reistad et al.ll2nil 



Aarnes et al. 



201J). Sec|5] 



discusses the differences in method and results, and points at possible weaknesses of the 
method. Finally, Sec [6] presents our conclusions on the general usefulness of the method 
and its application to significant wave height and the ECMWF EPS system. 

2 Modelled and Observed Wave Climate 



To assess the validity of our return value estimates from EPS forecasts, we will make a 
number of comparisons with observational records, reanalyses and hindcasts of signif- 
icant wave height. This section presents the observations used and the five model data 
sets (ERA-40, ERA-I, NORAIO, EPS and EPS reforecasts). We investigate the EPS cli- 
matology at analysis time (labeled EPSO) and at +240-h lead time (labeled EPS240) and 
assess the stationarity and independence of EPS240. Time series of all model data have 
been interpolated to buoy locations. 

Time series have been extracted from ERA-40, ERA-I, NORAIO and EPS and interpo- 
lated to the same 1° x 1° grid of the northeastern part of the Atlantic Ocean, the North 
Sea and the Norw^egian Sea in order to make geographical comparisons of extreme value 
estimates. The regridding and interpolation will inevitably smooth the field slightly. 
This will influence the return values somewhat. It is of interest to compare our EPS re- 
turn estimates with these reference data sets because all three archives (ERA-40, ERA-I 
and NORAIO) are freque ntly used for return value estimation (see e.g. ICaires and Sterl 



2005al: 



Aarnes et al.l 



20121) 



2.1 ERA-40 



Significant wave height from the ERA-40 reanalysis (|Uppala et al.l, 120051 ) is available for 
the period September 1957 to August 2002 on six-hourly temporal resolution. The atmo- 
spheric model was coupled to a deep-water version of the w^ave mode l (W AM) through 



exchange of a wave-modified Charnock parameter JTanssen et al 



2002 and 



TanssenI 



2004 



pp 232-234). WAM was run on a regular 1.5°-grid. At this resolution the Shetland 
and Faroe archipelagoes are not resolved and the modeled w^ave field on the lee side 
of these islands is co nsequently biased high. It is also we" 



Reistad et al. 



1 known that ERA-40 is bi- 
, l201lh . For this study we 



have not attempted any correction either to the time series themselves, which is how 
Caires and Sterll (|2005bl ') came up with the corrected semi-global fields referred to as 



ased low in general (jCaires and Steri l2005bl: 

T 
J 

the Corrected ERA-40, or b y correction of the 10 0-year return values, which is how 
Caires and Sterll (|2005ah and ISterl and CairesI (|2005h constructed the global maps of re- 
turn values from ERA-40. We argue that for this study it is better to compare the original 
data sets to avoid confounding artefacts o f the new approach wit h art efacts of the sta- 
tistical correction algorithms employed by lCaires and Sterll (|2005a|) and ICaires and Sterl 



of 



(l2005m. However, we do discuss how our results qualitatively correspond to the results 



Caires and Sterll (|2005al ) in SeclH 



2.2 ERA-Interim 

ERA-I is a continually updated coupled atmosphere-wave reanalysis which originally 
covered the period from 1989 (roug hly co-incident with the satellite era), but which has 



2007 



Uppala et al 



20081: 



Dee et al. 



recently been extended back to 1979 i Simmons et al. 

I 1 

120111 ). The resolution is 1.0° for the wave model at the equator, but the resolution is 

kept nearly constant towards the poles by the use of an irregular latitude-longitude grid. 
The wave model is coupled to the atmospheric model in the same fashion as outlined 
above for ERA-40, but the ERA-I wave model physics include shallow-water effects im- 
portant in areas like the southern North Sea. ERA-I also differs from ERA-40 in its use of 
a four-dimensional variational assimilation scheme and a substantially larger amount of 
observations, especially after 1991. ER A-I uses a su bgrid scheme to represent the down- 
stream impact of unresolved islands JBidlotl, l2012n . Though a clear improvement over 
ERA-40, the wave field in the lee of the Faroes and the Shetland Isles is still biased a little 
high. 



2.3 The NORAIO regional hindcast 

NORAIO, a recently completed atmospheric d ownscaling of ERA-40 and wave model 



hindcast on 10-11 km resolution, is described by lReistad et al.l (|201lh . The model domain 
covers the North Sea, the Norwegian Sea and the Barent Sea. The temporal resolution 
of the archived fields is three hours. The hindcast initially covered the ERA-40 period 
(September 1957 to August 2002), but an extension with boundary and initial fields from 
the ECMWF IFS has since been added. The hindcast archive is continually updated. 
The breach of stationarit y due to the change in boundary and initial values after August 



2002 w^as investigated by lAarnes et al.l (|2012l) and no statistically significant changes were 
found. The median and upper percentiles of NORAIO Hg show little bias and generally 
close correspondence with the wave observations used in this study. The model resolves 
the main coastal features and the archipelagoes in the Norwegian Sea. Like ERA-I the 
wave model is run in shallow water mode. 



2.4 ECMWF EPS archive 

We have extracted the significant wave height from archived operational ECMWF EPS 
wave forecasts for the period 1999-2009, a total of 11 years. The data set is not homoge- 
neous, i.e., the resolution and the model physics of the operational EPS forecast system 
have been continually upgraded (see Fig [T] for the most important changes affecting the 
wave field). The wave model has been coupled to the atmospheric model in the same 
fashion as for ERA-I. The data assimilation scheme has been upgraded several times 
during the archived period, and the amount of data entering the assimilation cycle has 
steadily increased. It is also important to note that the forecast systems started issuing 
two forecasts per day on 2003-03-25 (00 and 12 UTC analysis time). This means that 
the amount of data is not uniform over the period. We have extracted the analysis and 



the +240-h forecasts from the 50 perturbed ensemble members plus the control mem- 
ber (forced by unperturbed wind fields). We have also extracted the forecasts at +228 h 
(EPS228 hereafter). This data set is naturally slightly more correlated than EPS240 and is 
used here primarily to assess the validity of the method. 



2.5 EPS reforecasts 

Since 2008 every new model cycle of the EPS has been accompanied by a model clima- 
tology based on reforecasting a five-member ensemble of the current model cycle (four 
perturbed and one unperturbed member) from ERA-I initial conditioris every Thursday 



from 18 years back and up until the present da y (|Hagedorn et al 



20081: 



Hagedorn et al. 



20121: 



Hagedorn , 



20081: 



Prates and Buizzal,l201ll ). These reforecasts are run to 32 days, sim- 



ilar to the operational forecasts. We have extracted the reforecasts valid at +240 h from 
model cycle Cy36r4, in operation after the end of the archived period, from 60 locations 
in the north-east Atlantic, the Norwegian Sea and the North Sea. The reforecasts are 
in principle also useful for extreme value analysis, but unfortunately the data set is too 
small by two orders of magnitude to allow the kind of analysis attempted with the EPS 
forecasts: 



5 weekly members x 18 yr 



51 members x 2 daily forecasts x 7 weekdays x 11 yr 



0.011. 



(1) 



2.6 Comparing observed and modeled significant wave height 

Wave observations are routinely archive d and quality-cont rolled by ECMWF as part of 
the wave model intercompari son effort dBidlot et al.l, |2002| ). To make the observations 



comparable with model output lBidlot et al.l (|2002h averaged observations over four hours 
centered on the synoptic times. The rationale behind this averaging procedure is as fol- 
lows. Typical wind conditions in the open ocean are on the order of 10 ms^^. For fully 
developed wind sea the group speed, which dic tates the propagat i on speed across the 



2007: 



World Meteorological Organization 



model grid, will be comparable to the wind speed (|Holthuijsenl. l! 

19981 ). If the resolution is ~1.5° then the time interval that is represented by the model 



output is 4-6 h. Archived model values, although "instantaneous" in the sense that they 
are model output, are thus slowly changing and should be considered averages repre- 
sentative of intervals of 4-6 h in the case of the coarser archives discussed below (ERA-40, 
ERA-I and EPS). The NORAIO archive has much higher resolution (10-11 km) and is con- 
sequently also archived at three-hourly resolution. Both ERA-40 and ERA-I are archived 
on six-hourly resolution and the return values derived from these reanalyses should be 
interpreted as six-hourly averages of the significant wave height. EPS is of compara- 
ble resolution and we assign the same interval to the EPS240 forecasts. We discuss this 
further in Sec|5l 

Three locations with quite different wave climate were selected from a total of 60 
observation stations in the North East Atlantic, The Norwegian Sea and the North Sea 
for inspection of the relative performance of the reanalyses and the EPS: 

• P40 (Ekofisk oil field, WMO code LF5U, 56.50° N, 003.20° E) in the central North 
Sea 



P35 (Heidrun oil field, WMO LF3N, 65.30° N, 007.30° E) in the eastern Norwegian 
Sea 

B16 (K5 buoy, WMO 64045, 59.10° N, 011.40° W) in the north-east Atlantic 



3 Estimating return values from ensemble forecasts 

3.1 Extreme value distributions applied to ensemble forecasts 

The two most commonly used statistical methods for estimating return values are based 
on the GEV distribution for blocked maxima and the GP distribution for values ex- 
ceeding a set threshold. For complet eness we rep eat here the general form of the GEV 
and the GP distributions but refer to IColesI (|200l[) for a more in-depth discussion. The 



GEV distribution (GEVD) is an asymptotic limit for a distribution of blocked maxima 
Mn = max{Xi, X2, . . . , X„}. The method is routinely used to approximate the proba- 
bility distribution of blocked maxima such as annual maxima (jColesI I2OOII. pp 45-51). 
However, the method can also be used on ensembles of independent and identically dis- 
tributed (iid) forecasts since the blocking procedure itself makes no assumption of the 
grouping other than to ensure that all blocks have the same statistical properties. The 
blocking can thus be performed in many ways, but it is natural to block by ensemble 
member or some subset of time and member which is sufficiently large to ensure t hat the 



Coles 



GEVp is a reasonably good approximation to the parent distribution. Following 
(|200l|) the cumulative distribution function (CDF) of the block maxima formed from a 
random sequence of independent variables can be written 



G(z) = exp 



l + l 



IJ-n 



ay, 



-!/£■ 



(2) 



where o"„ is the scale parameter, ju„ is known as the location parameter, and E, is the 
so-called shape parameter. The GEV distribution contains as special cases the Frechet 
{E, > 0), Gumbel {£, = 0) and reversed WeibuU {£, < 0) distributions. The width of 
confidence i nterv als will depend strongly on the sign of the shape parameter ( Hoskingl. 



1984 



Colesl.l200lh . 



The GP method retains only values exceeding a threshold u. Th e transformed vari- 
able is written 1/ = X, — u, y > 0. It can be shown (|Colesl I2OOII. pp 75-77) that the 
GP distribution (GPD) is applicable if the data y are independent and the maxima M„ 
formed from the original variable belong to the GEV distribution, Eq (|2|. The GDP is 
written 



H(y) 



1 + ^ 



-yr 



a 



(3) 



Here o" = a + £,{u — \x) and ^ is the shape parameter found in Eq (O. For ^ = GP 
becomes an exponential distribution. In traditional studies of wind and wave extremes 
it is common to select only peaks separate d by at least 24-48 hours to ensure that the 



maxima represent individ ual storm events JLopatoukhin et al.l, I2OOOI: 



Aarnes et al. 



2OI2I: 



Naess and Clausenl, I2OOII) . This is known as the "Peaks-over-threshold" method (POT) 
and it is the method w^e apply to the ERA-40, ERA-I and NORAIO time series. Since we 
assume (see below) that the EPS forecasts are uncorrelated at +240 h the ensemble mem- 



bers represent independent events and we can retain all values exceeding the chosen 
threshold. In this case the return values are more properly referred to as GP threshold 
estimates rather than GP/POT estimates. 

3.2 Criteria for using ensembles for extreme value estimation 

The following assumptions must be shown to hold in order to estimate return values 
from ensemble forecasts: 

1. Each forecast is representative of a time interval (e.g. six hours) 

2. The model climatology distribution is comparable to the observed climatology dis- 
tribution 

3. No spurious trend in the mean and the variance due to model updates 

4. No significant correlation between ensemble members at advanced lead times 

If these conditions are met the ensemble can be assumed to be independent and identi- 
cally distributed and representative of the observed wave climate. We will address each 
of these conditions below. 

3.2.1 Estimating return periods from ensembles 

Turning M ensemble forecasts with N ensemble members each into the equivalent of a 
time period is necessary in order to convert from probability of exceedance to a return 
period. We assume each forecast to represent a six-hour interval based on the following 
reasoning. First, Af = 6 h matches the temporal resolution of the ERA-I and ERA-40 
archives, simplifying the comparison. Second, as discussed in Sec [2112.61 ), model fields 
are smoothly varying in time, making them representative of averages over typically 4-6 
hours at the resolution of ERA-40, ERA-I and EPS. This allows us to treat the collection 
of ensemble forecasts as an equivalent time period Tgq = MNAt. We discuss the validity of 
this assumption in Sec|5l 

3.2.2 Climatology of ensemble forecasts at advanced lead times 

To convince ourselves that the EPS240 dataset is identically distributed we need only 
look at the quantile-quantile (QQ) distribution of two members. Panel (c) of Fig |2] clearly 
demonstrates the similarity of the distributions of two randomly selected ensemble mem- 
bers (the statistics look essentially the same for all combinations of members, not shown). 
The QQ-plot against observations in panel (a) of Fig |2] makes it clear that the ensemble 
members at +240 h represent the observed climate well. In fact, the cumulative distri- 
bution of the ensemble at +240 h is somewhat better aligned with observations than at 
analysis time as can be seen in panel (b) of Fig |2] for the Ekofisk location in the North 
Sea, but the differences in distribution are small from analysis time to +240 h for all the 
observation locations and we conclude that we can assume that the ensemble members 
are identically distributed and that they represent the observed wave climate well. 

3.2.3 Stationarity of model climate 

To address the question of w^hether there is a spurious trend in the model climate over the 
archived period we compare against the reforecasts for the same period from Cy36r4. We 

8 



are only interested in the behavior of the +240 h forecasts as our objective is to investigate 
whether the changes in model physics and resolution, especially in the early days of the 
EPS forecast system, will have significant impact on our return value estimates. 

The EPS forecasts at lead time +240 h are compared with the reforecasts at the same 
lead time and at the same analysis dates over the period 1999-2009 in Fig |3l Panel (a) 
presents the time series of the annual mean deviations of the significant wave height 
at 60 locations in the north-east Atlantic, the North Sea and the Norwegian Sea. The 
annual mean difference in their standard deviation is shown in panel (b). It is clear from 
the boxplots that there is considerable deviation in both mean and standard deviation 
between the EPS forecasts and the EPS reforecasts throughout the model domain from 
one year to another, but there is no consistent drift in either statistic. However, it is 
clear that the reforecasts have slightly higher mean (~0.05 m) and standard deviation 
(~0.10m). 

3.2.4 Independence of ensemble forecasts at advanced lead times 

The motivation for using ensemble forecasts at long lead times is the anticipation that 
the upper percentile of the data will be found to be independen t. In other words, we are 



looking for forecasts with as little skill as possible (|Wilksl.l2006f ). With a set of ensemble 
forecasts at advanced lead time (e.g. 240 hours), the question is then whether the correla- 
tion between two arbitrary ensemble members is sufficiently Ioav for the members to be 
assumed independent. The residual correlation (after subtraction of the seasonal mean) 



betw een two fields can be investigated through the centered anomaly correlation (jWilks 



20061. pp 311-312) which for two ensemble members i and / can be defined as 



cov(x;,x;.) 

rACC = ^^, (4) 

where primes indicate centered anomalies. The anomaly correlation at all three locations 
is approximately 0.20, while the actual correlation varies from 0.46 in the open ocean 
(B16 and P35) to 0.35 at Ekofisk (P40) in the central North Sea. With small variations this 
is the level of anomaly correlations found throughout the domain studied here. 

Even weak correlations between ensemble members (if significant) may have a dele- 
terious effect on the equivalent or effective ensemble size. This is a well known problein 
often discussed in the context of autocorrelated time series (|von Storch and Zwierslll999l, 



pp 371-372). Denote the sample size (in our case the number of archived forecasts) M 
and the ensemble size N. The entire ensemble is X G M^^^. The ensemble variance- 
covariance matrix is w^ritten {eiej) G M^^^, where e; represent departures from the en- 
semble mean. If we assume all members to have equal variance (e,e;) = s^ and common 
correlation r (a reasonable assumption since there is nothing to distinguish one member 
from another) such that (e^ey) = rs^ (where we note that r = 1 when / = /) we arrive at 
the following relation for the variance of the ensemble mean. 



„2 

5 

N 



4=^(l + (N-l)r). (5) 



The effective sample size is now found from s| = s'^/N*, i.e., the variance of the mean of a 
smaller ensemble of uncorrelated members should equal that of our correlated ensemble. 



The effective ensemble size becomes 

N 

N* = (6) 

l + (N-l)r ^^ 

and it is clear that even quite weak correlations can seriously reduce the effective ensem- 
ble size and have a detrimental impact on the mean properties of the ensemble. How- 
ever, assessing the impact of correlations on the ensemble mean alone is of limited value 
since only the upper percentiles of the data set are actually used for the return value esti- 
mates. To investigate the possible impact of correlations on the tail of the data set w e first 



computed the correlation and the Spearman rank correlation (see lPress et al.lll992n for a 
subset of the forecasts where at least one ensemble member exceeded the 97 percentile 
(P97). Members not exceeding the threshold were set to zero. The average rank corre- 
lation and Pearson's correlation coefficients were 0.05 for this subset of forecasts. This 
show^s that the higher percentiles of the ensemble tend to be uncorrelated even if the 
ensemble itself exhibits weak correlation. This is not surprising given the nature of our 
analysis. We are selecting the upper percentiles from a large data set. This means that we 
are only selecting storm events, which are transient and fast-moving. It is unlikely that 
storm events exceeding P97 will occur simultaneously in many ensemble members after 
a 10-day integration. Average sea state will on the other hand be more correlated at long 
lead times since such weather patterns are less transient (e.g., high pressure situations). 
To assess the impact of any residual correlation on the return values we followed a 
heuristic approach suggested by M. Leutbecher (pers comm) where return values from N 
individual ensemble members are compared with return values from decimated subsets 
of similar sample size where all members are used, but only every N-th forecast. We 
thus arrive at two distributions of return values drawn from samples of the same size M. 
Splitting the total data set in N parts obviously increases the uncertainty associated with 
the return value estimates, but we are only interested in comparing the distributions of 
the tw^o sets of return values. Fig H] compares quantiles of the 51 member-wise return 
values (first axis) with the 51 return values from the decimated samples for location P40 
in the central North Sea. It is evident that the two distributions are very similar with only 
slightly higher standard deviation (1.83 m v 1.67 m) for the member-wise estimates. The 
average is practically the same (11.30 m). We conclude that the weak correlations found 
between ensemble members in the mean have no discernible impact on the expected 
value or the spread of the return estimates. 

4 Comparison of extreme value estimates and their 
confidence intervals 

Gridded estimates of the 100-year return value Hioo of the significant wave height were 
made from EPS240 interpolated to a 1.0° grid for the North Atlantic, the Norwegian 
Sea and the North Sea using both blocked maxima (GEV) and threshold exceedances 
(GP). Note that this is an extraction procedure and does not reflect the underlying model 
resolution, which as Fig [T] shows has increased over the archived period. Ice-infested 
locations, i.e., locations where the modeled ice concentration ever exceeds 30%, have 
been removed from the analysis. 
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We start by looking at the differences we can expect from the existing reanalyses and 
hindcasts available to us. Fig |5] shows the difference between return values estimated 
from ERA-40 and ERA-I (panel a) and similarly between ERA-40 and NORAIO (panel 
b) using GP (POT) with a threshold of 97%. The differences between ERA-40 and ERA-I 
(panel a) are moderate in the open ocean, ~1 m, but the enhanced resolution of ERA-I 
is clearly visible on the lee side of the Faroe and Shetland archipelagos. Also, ERA-40 
yields higher estimates in the central North Sea due to the deep-water physics scheme 
employed by the ERA-40 WAM model. Panel (b) shows th e difference between ERA-40 
and NORAIO. As reported earlier by Uarnes et al.l (|2012l ), NORAIO consistently esti- 



mates higher return values than ERA-40 (3-4 m higher in the northern North Atlantic), 
except along the open boundary to the south and west where the model is forced with 
ERA-40 boundary values. Note also that the central North Sea exhibits th e smallest dif- 
ferenc e, since the ERA-40 deep-water physics yields higher values here. 



(I2OI1I ) 



Reistad et al. 



I looked at the upper percentiles of the NORAIO wave height distribution in a num- 
ber of locations in the North Sea and the Norwegian Sea and found good agreement with 
buoy measurements. It is clear that the differences seen around the Faroes and Shetland 
in Fig |5] are due to resolution issues partly solved by ERA-I and more fully resolved by 
NORAIO. In the open ocean ERA-40 and ERA-I seem to be in agreement while NORAIO 
estimates significantly higher return valu es (on the order of 3 -4 m in the northern North 



Aarnes et al. 



2012). 



Atlantic, consistent with what is found by 

Fig[6]presents the return values for EPS240 using GEV (panel a) and GP (panel b). The 
differences are very small. The blocking for GEV w^as performed per ensemble member 
to minimize the effect of the varying resolution and the numerous model upgrades in- 
troduced throughout the archived period (cf Fig[l]). The GP threshold was set at P97, but 
varying this threshold d id n ot affect the return valu es considerably (results not shown). 
Caires and Sterll (l2005al ) and ISterl and CairesI (120051 ) find that ERA-40 when calibrated 
against buoy data yields return values on the order of 24 m in the northern North At- 
lantic; higher but much closer to our findings than the uncorrected ERA-40 return values. 

Fig [7| presents the difference between the GP 100-year return values found for EPS240 
and those found for ERA-I (left panel) and NORAIO (right panel) using GP with a P97 
threshold. It is clear that EPS240 predicts considerably higher return values than what 
is found for ERA-I. The differences approach 5 m in the North Atlantic, and through- 
out the Norwegian Sea we see differences on the order of 2-3 m. The differences be- 
tween NORAIO and EPS240 are smaller (panel b), but here we see significant differences 
throughout the domain that must be considered separately. First, the influence of ERA- 
40 on NORAIO is visible along the open boundary in the south-west, so the boundary 
zone should not be taken into account in this analysis. Second, EPS240 does not properly 
resolve the Faroes and the Shetland archipelagos. The middle North Sea is biased low^, 
which is probably also a resolution issue. Away from these areas we see that the agree- 
ment is generally good, except in the northern p art of the Norwegian Sea, w^here EPS240 



is up to 3 m lower than the NORAIO estimates. lAarnes et al.l (|2012l ) discusses the large 
impact of an individual storm event on the NORAIO estimates in thi s area and w^e note 
that the confidence intervals are exceptionally w^ide here (14-22 m, see 
Fig 3). 



Aarnes et al, 



2012, 
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4.1 Bootstrapping confidence intervals 

The GEV shape parameter ^ in Eg Q arid its counterpart fo r GPD in Eq (O determine the 
width of confidence intervals (|Hoskingl.ll984l:IColes . 2001). Th e signi ficant wave height 



from the NORAIO hindcast has been shown by lAarnes et alJ 112012]) to exhibit a wide 
range of extreme value shape parameters within the Norwegian Sea and the adjacent 
seas with correspondingly varied confidence intervals. 

We have estimated confidence inter yals for EPS240 and ERA-I using a bootstrapping 
technique similar to that employed by lAarnes et alJ ||2012J). For ERA-I w^hich represents 



a traditional time series we have made 100 random draws w^ith replacement from the 
POT data (see Fig [T]). In the case of EPS240, we have similarly made random draws 
from the tail of the dataset exceeding the 97 percentile (note that this is technically not 
peate-over-threshold since the EPS240 data are considered independent. 

The upper limits of the confidence intervals found for ERA-I and EPS240 are shown 
in FiglHl panels (a) and (b). The differences are pronounced. First, the confidence interval 
is much tighter for EPS240 (panel d), ranging from less than 1 m in the sheltered parts 
of the North Sea to approximately 2 m in the open ocean (relative width 5-10% of the 
return values). ERA-I (panels a and c) has confidence intervals up to 5 m (relative width 
30% of the return values) in the north-east Atlantic. Second, the spatial variability of the 
confidence intervals is very low for EPS240, while the ERA-I intervals vary substantially 
throughout the domain due to sensitivity to individual storm events. 

It is important to stress that even though the confidence intervals become much 
tighter with a larger data set, the bootstrapping method does not account for model 
bias. The bias must be assessed by comparing the observed and modeled wave height 
distributions, see Sec l3ll3.2| ). We discuss the impact of model bias further in Sec|5l 

5 Strengths and limitations to the method 

Estimating return values from ensembles at advanced lead times is a new technique, 
and the assumptions underlying the method have been outlined in Sec |3l Here we dis- 
cuss some of the perceived weaknesses of the method in general and how applicable the 
method appears to be for significant wave height. The main caveats to be aware of when 
using the technique on archived EPS forecasts in general are: 

1. Spurious trends caused by model upgrades 

2. Upper-percentUe biases 

3. Conversion to an equivalent time period 

4. Correlations within the ensemble 

5. Return value estimates in a changing climate 

Although we do not find evidence of any spurious trend in the mean and the variance of 
the significant w^ave height at +240 h (see Fig|3ll, we are aware that the IPS model updates 
over the past decade have led to an apparent increase in the 10-m wind speed at analysis 
time. S. Abdalla (pers comm) quantified this effect at analysis time to be about 29 cm s^\ 
i.e., the earlier analyses were biased low. The wave height will be somewhat affected by 
this, but it is thought to have a small effect on the extremes of waves found at advanced 
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lead times, especially since some of the removed bias stems from changes to the data 
assimilation and will fade as the model integration becomes dynamically balanced at 
advanced lead times. The effect is also evident from inspection of Fig|2]where the wave 
height at analysis time (panel b) is seen to be biased low. Since this bias disappears for 
EPS240 (panel a), we believe that the model updates have had only a modest impact on 
the wave climatology at advanced lead times. 

We have investigated the robustness of the return value estimates by also looking at 
EPS228. We select the maximum from each pair of EPS228 and EPS240 since the +228 
and +240-h forecasts are strongly correlated (see Fig|9l). This combined data set is now 
assumed equivalent to 2 x 226 years. The combined 100-year GP return value estimates 
(indicated by blue circles) fall between the 100-year return values from the the two data 
sets (EPS228, green circles, and EPS240, red circles), which is w^hat we expect w^hen going 
to larger data sets. This suggests that even larger data sets may be built by selecting 
maxima from longer forecast sequences. However, care must be taken to avoid getting 
too close to the beginning of the forecast where the ensemble members are correlated. 

We have also looked at the possible sources of bias to the extremes from EPS forecasts. 
Such a bias can not be estimated from a bootstrap procedure. Instead we have compared 
the return values of the ERA-40, ERA-I and EPS240 against NORAIO which has been 



shown to represent the upper percentiles well (JReistad et al. 



2011 



Aarnes et al. 



2012h . 



Biases can enter a model data set in two distinct ways. The fi rst is through poor repre- 



Aarnes 



etal.ll2012h . 



ERA- 



sentation of physical processes. For example (as pointed out byL 
40 applies deep-water physics in areas where this is questionable (such as the southern 
North Sea). If a bias is due to poor model physics or poor model resolution then the bias 
should also be different from one region to another. The second way biases can enter a 
data set is through poor spatial and temporal sampling of the model fields. For example, 
both ERA-40 and ERA-I are archived with six-hourly resolution and will consequently 
miss some modeled storm maxima, even if a coarse model as discussed in Sec [2112.61 ) is 
slowly varying and representative of significant wave height averaged over 6 h. The data 
sets are also typically interpolated in space, leading to further reduction of extremes. 
This means that the return values from EPS and coarse resolution reanalyses such as 
ERA-40 and ERA-I should be interpreted as return values of the six-hourly average sea 
state, as discussed in Sec l2ll2.6D , and will thus generally be lower than those found from 
a high-resolution hindcast such as NORAIO where the model values represent shorter 
time intervals. Fig [10] shows how the return values ranging from Hi, . . . , Hioo line up. 
It is clear that ERA-40 and ERA-I due to their negative bias give significantly lower re- 
turn values than NORAIO, especially in the open-ocean conditions in the North Atlantic 
(panel b). EPS240 on the other hand matches the return values better, and the shorter 
return periods also line up well here. For Ekofisk (panel a) in the central North Sea the 
situation is different. Here the shorter return periods match well, but due to the relatively 
coarse resolution of EPS240 Hioo seems to converge to approximately the same value as 
ERA-40 and ERA-I (11.3 m). NORAIO yields 100-year return values closer to 13 m for 
this location. It seems likely that the EPS240 return values are biased low in enclosed 
seas and should consequently be used with some care in such areas. The EPS240 return 
values are close to NORAIO estimates in the open ocean and we conclude that the bias 
from spatial and temporal interpolation is of less importance, since otherw^ise the return 
values should be depressed everywhere. 
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Lack of forecast skill at advanced lead times is an important requirement since the 
ensemble members must be assumed uncorrelated to be considered independent draws 
from the model climate. We have shown that the w^eak correlations in the mean are not 
present in the tail of the distribution in the case of significant wave height (see SeclSj. 
However, it seems likely that the method is not equally applicable to the investigation 
of the extremal behavior of param eters represen ting large-scale features, e.g. the North 
Atlantic Oscillation (NAO) index (|Hurrelll, 119951) , or long-term (seasonal, say) averages. 
Here we do expect the ensemble forecast system to retain skill at advanced lead times, 
and indeed forecast ski ll in reproducing large-scale features is the rationale behind sea- 



sonal forecast systems dStockdale et al 



to six months i Van den Brink et al. 



1998, 



201 ih . where the lead time typically goes 



20051 ). We therefore find it prudent to advice against 



employing the method on large-scale spatial averages or long-term temporal averages. 
It is also clear that the forecasts only differ from the initial conditions by as much as 
240-h integrations allow and will still be influenced by the slow components of the earth 
system, like the Arctic ice cover. This means that for parameters influenced by climate 
change or w^here quasi-cyclical phenomena with long-periodic components such as the 
El Nino, Southern Oscillation (ENSO) are present we must be careful when assessing 
the return values since we must be convinced that the archive covers a sufficiently long 
period to capture all the stages of the phenomenon. As noted in Sec [H under such cir- 
cumstances non-stationary techniques employed on traditional time series and climate 
projections will be more relevant. If on the other hand return values valid for the present 
period are sought then ensemble forecasts are superior. 

6 Conclusion 



Return values estimated from long lead-time ensemble forecasts have been investigated 
and found to yield good results. The immediate advantage is clear; a huge data set of 
forecasts are readily available from the ECMWF archive. The method yields return val- 
ues of significant wave height that are comparable to what is found from NORAIO but 
significantly higher than what is found from ERA-I and ERA-40. This result was not 
totally unexpected since it is known that ERA-I and especially ERA-40 tend to under- 
estimate the upper percentiles of the wave height distribution. The EPS estimates are 
probably too low in enclosed seas (see Fig [TOl l. Although we have only investigated 
the extremes in the North Atlantic, the North Sea and the Norwegian Sea, it appears 
likely that the extrer ne value estimates found from ERA-40 and E RA-I are too low glob- 



ally (as discussed by lCaires and Sterl2005al: 



Sterl and Caires 



20051 i n the case of ERA-40). 



How^ever, the return value estimates from NORAIO (jAarnes et al.l. 120121) and the present 
finding s su ggest that the corrected ERA-40 return estimates reported by lCaires and Sterl 
pOOSafl and lsterl and CairesI QOOS) are too high. 

Return value estimation from large ensembles at advanced lead times is a general 
method which should be applicable to a wide range of atmospheric and oceanographic 
variables if the conditions discussed in Sec |3] and Sec |5] are met. It is clear that the EPS 
archive represents an unused resource which complements and perhaps yields more pre- 
cise return values than traditional reanalyses and hindcasts. 
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Figure 1: A timeline of major updates to the operational EPS. The upper arrow illustrates 
changes to the atmospheric component of the integrated forecast system (IFS) while the 
middle arrow lists the major changes to the wave model (WAM). Note that only one forecast 
per day (00 UTC) was issued before 2003-03-25. The most important changes related to the 
wave model are the changes in resolution which affect the areas in the lee of the Faroes and 
the introduction of shallow-water physics which mainly affects the southern North Sea. 
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Figure 2: Panel a: The correlation of all 51 ensemble member forecasts at +240 h with 
observations of significant wave height at location P40 (Ekofisk, central North Sea) over the 
whole period 1999-2009. The QQ curve is shown in green. It is clear that the +240 h climate 
is quite similar to the observed climate in this location and better than the wave height 
distribution found at analysis time (b). Panel b: Same as panel (a) for analysis time. The 
EPS is biased low at analysis time. Panel c: The correlation between the +240 h forecasts 
of two ensemble members (member represents the unperturbed atmospheric integration) 
over the whole period 1999-2009. The QQ-curve is shown in green. The centered anomaly 
correlation relative to the weekly observed wave climate is 0.20. 
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Figure 3: Panel a: Time series (1999-2009) of the difference in annual mean significant wave 
height [m] of EPS forecasts at lead time +240 h and the reforecast at the same lead time from 
model cycle Cy36r4 (operational in May 2011) at 60 locations in the north-east Atlantic, the 
North Sea and the Norwegian Sea. Panel b: The difference in annual standard deviation 
[m] of the significant wave height of EPS240 and the reforecast. Considerable variations are 
found from one year to another, but no significant drift due to model upgrades is seen. 
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Figure 4: Comparison of the quantiles of Hioo (100-year return values) from 51 ensemble 
members over the whole period 1999-2009 v the quantiles of Hioo estimates from 51 dec- 
imated samples of all ensemble members taken every 51 forecasts. This decimation gives 
51 estimates of same sample size as the member-wise estimates. The distributions are very 
similar with common averages (11.30 m) and standard deviations of 1.83 m and 1.67 m, 
respectively. 



22 



Discrepancy in 100-year Hs based on POT {threshold: 97.0) 
ERA40-ERAi [m] 



Discrepancy in 1Q0-year Hs based on POT (threslnold: 97.0) 
ERA40-NORA10[m] 



(a) 




(b) 




Figure 5: Panel a: Difference of Hioo of ERA-40 and ERA-I using the GP (POT) method with 
a threshold of 97%. The differences are ^1 m in the open ocean, but behind the Faroes and 
the Shetland isles ERA-40 yields higher return values. This is to be expected from a coarser 
reanalysis. The effect of the deep-water wave physics employed by the ERA-40 WAM model 
also yields higher return values in the North Sea. The model fields are interpolated to a 
common grid with 1.0° resolution, and all ice-infested grid points have been excluded. The 
three buoy locations are marked as P40 (Ekofisk), B16 (K5) and P35 (Heidrun). Panel b: 
Difference of ERA-40 and NORAIO, same threshold as for panel (a). The difference between 
NORAIO and ERA-40 is consistently ^3 m in the open ocean, but approaches zero at the 
open boundary where ERA-40 provided the boundary values. 
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Figure 6: Panel a: Gridded estimates of Hioo from EPS240 using GEV with blocked maxima 
from individual ensemble members. The grid is 1.0°, and all ice-infested grid points have 
been excluded. Panel b: Same as panel (a) but for GP with a threshold of 97% of the data. 
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Figure 7: Panel a: Difference between Hioo estimates of EPS240 and ERA-I using the GP 
distribution with a threshold of 97% (positive means EPS240 is higher). EPS240 consistently 
predicts higher return values throughout the domain. The differences are ^3 m in the open 
ocean, with the North Atlantic approaching a difference of 5 m. The differences are smallest 
in the central North Sea. The grid is 1.0°, and all ice-infested grid points have been excluded. 
Panel b: Difference between EPS240 and NORAIO, same GP threshold as for panel (a). The 
difference between NORAIO and EPS240 is generally smaller than what was found in panel 
(a) for ERA-I, but significant geographical differences exist. Near the south-western bound- 
ary NORAIO is influenced by ERA-40, and behind the Faroe and Shetland archipelagos the 
resolution of EPS240 is too coarse to provide a meaningful comparison. 
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Figure 8: Panel a: Upper limit of 95% confidence interval for ERA-I Hioo based on 100 
bootstraps of the POT exceeding the 97 percentile. Panel b: Upper limit of 95% confidence 
interval for EPS240 Hioo based on 100 bootstraps of the data exceeding the 97 percentile. 
Panel c: Width of 95% confidence interval for ERA-I. The relative width reaches 30% of the 
return values in parts of the north-east Atlantic. The geographic variability is pronounced, 
largely due to influence from individual storm events. Panel d: Width of 95% confidence 
interval for EPS240. The relative width varies from 5% in sheltered areas to 10% in the open 
ocean. The geographic variability is very low. 
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Figure 9: Panel a: Probability of non-exceedance for the EPS forecasts in location P40 
(Ekofisk, central North Sea). The GPD estimates are shown as curves (with 100-year values 
marked as circles) and are based on a 97% threshold while the individual forecast values 
are shown as asterisks. Green and red indicate EPS228 and EPS240, respectively. Blue is a 
combined estimate for EPS228 and EPS240 where the maximum of each pair of EPS228 and 
EPS240 forcast is chosen. This is done because the two forecasts are separated by only 12 h 
and strongly correlated. The combined data set thus represents the equivalent of 452 years 
of data since EPS228 and EPS240 each represents the equivalent of 226 years. The combined 
data set lies below the EPS228 and EPS240 on the vertical axis since it has a lower probabil- 
ity of non-exceedance due to being twice the size of EPS228 and EPS240. Panel b: Same as 
panel (a) but for location B16 in the eastern North Atlantic (note that the upper three val- 
ues of EPS228 are masked by EPS240 as the values are almost identical. Panel c: Same as 
panel (a) but for location P35 in the eastern Norwegian Sea (Heidrun). It is evident from all 
three panels that the combined 100-year estimate is bracketed by the estimates from the two 
individual estimates, as expected from a larger dataset. 
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Figure 10: Panel a. "Comet plot" comparison of Hi, H2, . . . , Hioo return values in the central 
North Sea (P40, Ekofisk) for NORAIO (first axis) against ERA-40 (red asterisks), ERA-I (blue) 
and EPS240 (green). All return estimates were made from the GP distribution with a 97% 
threshold. Panel b. Same as panel (a) but for location B16 in the eastern North Atlantic. 
Panel c. Same as panel (a) but for location P35 in the eastern Norwegian Sea (Heidrun). 
ERA-40 and ERA-I estimates are significantly lower than NORAIO in all three locations. 
EPS240 shows good correspondence in open-ocean conditions over the whole range up to 
Hioo (panels b and c), while in the North Sea the upper return values are substantially lower 
and closer to the ERA-40 and ERA-I estimates. 
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